19 research outputs found

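Intégration des plongements de mots dans les méthodes, supervisées et non supervisées, d'extraction automatique de mots clés [Integrating word embeddings into supervised and unsupervised methods for automatic keyword extraction]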

Author: Hary Razakasoa
Michel Rajoelina
Mothe Josiane
Ramiandrisoa Faneva
Publication venue: Veille Stratégique Scientifique et Technologique (VSST)
Publication date: 01/01/2018

Word embeddings have been used successfully in a variety of applications in natural language processing and information retrieval. This paper analyzes the impact of integrating word embeddings into supervised and unsupervised methods for automatic keyword extraction. Graph-based methods (unsupervised) and methods based on ensembles of decision trees (supervised) are widely used and studied because of their performance; we therefore focus on these. We considered Word2Vec [24], a word embedding method, and evaluated the impact of integrating word embeddings on two datasets that are benchmarks in the literature. We showed that there is no significant difference in the results when word embeddings are integrated into graph-based unsupervised methods. For supervised methods based on ensembles of decision trees, integrating word embeddings significantly improves the results for three of the four methods we tested. This article extends the articles [25, 26], which considered only unsupervised methods.

Open Archive Toulouse Archive Ouverte

A Unified System for Aggression Identification in English Code-Mixed and Uni-Lingual Texts

Author: Aroyehun Segun Taofeek
Arroyo-Fernández Ignacio
Fortuna Paula
François Chollet
Galery Thiago
Golem Viktor
Hutto Clayton J
Kingma Diederik P
Kumar Ritesh
Madisetty Sreekanth
Majumder Prasenjit
Mikolov Tomas
Orabi Ahmed Husseini
Orasan Constantin
Pawlikowski Maciej
Ramiandrisoa Faneva
Risch Julian
Tieleman Tijmen
Tommasel Antonela
Publication venue: Association for Computing Machinery (ACM)
Publication date: 18/01/2020

Wide use of social media platforms has increased the risk of aggression, which causes mental stress and negatively affects people's lives through psychological agony, fighting behavior, and disrespect for others. The majority of such conversations contain code-mixed languages [28]. Additionally, the manner of expressing thoughts, or communication style, changes from one social media platform to another (e.g., communication styles differ between Twitter and Facebook). All of these factors increase the complexity of the problem. To address them, we introduce a unified and robust multi-modal deep learning architecture that works for both an English code-mixed dataset and a uni-lingual English dataset. The devised system uses psycho-linguistic features and very basic linguistic features. Our multi-modal deep learning architecture contains Deep Pyramid CNN, Pooled BiLSTM, and Disconnected RNN (with both GloVe and FastText embeddings). Finally, the system makes its decision based on model averaging. We evaluated our system on the English code-mixed TRAC 2018 dataset and a uni-lingual English dataset obtained from Kaggle. Experimental results show that our proposed system outperforms all previous approaches on both the English code-mixed dataset and the uni-lingual English dataset. Comment: 10 pages, 5 figures, 6 tables; accepted at CoDS-COMAD 202
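
The abstract above says the final decision is based on model averaging. As a rough illustration only, one common form of model averaging takes the mean of each model's per-class probabilities and picks the class with the highest mean. The label names, the probability values, and the function name `average_predictions` below are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical label names, used only for illustration.
LABELS = ["non_aggressive", "covert", "overt"]


def average_predictions(model_probs: Sequence[Sequence[float]]) -> Tuple[List[float], int]:
    """Average per-class probabilities across models.

    model_probs holds one probability vector per model, all of equal length.
    Returns the averaged vector and the index of the highest averaged class.
    """
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    avg = [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: avg[c])
    return avg, best


# Made-up outputs of three models for a single post.
probs = [
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
    [0.2, 0.2, 0.6],
]
avg, best = average_predictions(probs)
print(LABELS[best], [round(a, 3) for a in avg])
```

Averaging probabilities rather than taking a majority vote lets a confident model outweigh two uncertain ones; whether the paper weights its models equally is not stated in the abstract.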

arXiv.org e-Print Archive

Crossref

Aggression Identification in Posts - two machine learning approaches

Author: Ramiandrisoa Faneva
Publication venue: CEUR
Publication date: 27/02/2020

Social media have changed the way people communicate. One aspect of this change is cyber-aggression and interpersonal aggression, which can be catalyzed by perceived anonymity. Automatically monitoring user-generated content to help moderate it is thus a hot topic. In this paper, we present and evaluate two supervised machine learning models to identify aggressive content and the level of aggressiveness. The first model uses random forest and linear regression, while the second model uses deep learning techniques.

Scientific Publications of the University of Toulouse II Le Mirail

Aggression Identification in Social Media: a Transfer Learning Based Approach

Author: Mothe Josiane
Ramiandrisoa Faneva
Publication venue: HAL CCSD
Publication date: 11/05/2020

The way people communicate has changed in many ways with the rise of social media. One aspect of social media is that information producers can hide their identity, fully or partially, during a discussion, which leads to cyber-aggression and interpersonal aggression. Automatically monitoring user-generated content to help moderate it is thus a very hot topic. In this paper, we propose to use the transformer-based language model BERT (Bidirectional Encoder Representations from Transformers) (Devlin et al., 2019) to identify aggressive content. Our model is also used to predict the level of aggressiveness. The evaluation part of this paper is based on the dataset provided by the TRAC shared task (Kumar et al., 2018a). Compared with the other participants in this shared task, our model achieved the third-best performance according to the weighted F1 measure on both the Facebook and Twitter collections.
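
The ranking above relies on the weighted F1 measure. As a minimal sketch, assuming the usual definition (per-class F1 averaged with weights proportional to each class's share of the true labels), it could be computed as follows. The function name `weighted_f1` and the toy labels are illustrative, and the shared task's official scorer may differ in detail.

```python
from collections import Counter
from typing import Sequence


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n_true - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * n_true / total  # weight each class by its support
    return score


# Toy example with made-up labels.
y_true = ["a", "a", "a", "b"]
y_pred = ["a", "a", "b", "b"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Weighting by support means frequent classes dominate the score, which matters when the classes in a collection are imbalanced.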

Scientific Publications of the University of Toulouse II Le Mirail

Détection de la dépression au plus tôt sur les réseaux sociaux (Journée IRIT-SIG, Toulouse, 19/01/18) [Earliest possible detection of depression on social networks (IRIT-SIG Day, Toulouse, 19/01/18)]

Author: Ramiandrisoa Faneva
Publication venue: HAL CCSD
Publication date: 19/01/2018

Scientific Publications of the University of Toulouse II Le Mirail

IRIT at TRAC 2020

Author: Mothe Josiane
Ramiandrisoa Faneva
Publication venue: HAL CCSD
Publication date: 11/05/2020

This paper describes the participation of the IRIT team in the TRAC (Trolling, Aggression and Cyberbullying) 2020 shared task (Bhattacharya et al., 2020) on Aggression Identification and, more precisely, in the shared task in English. The shared task was further divided into two sub-tasks: (a) aggression identification and (b) misogynistic aggression identification. We proposed to use the transformer-based language model BERT (Bidirectional Encoder Representations from Transformers) for the two sub-tasks. Our team ranked twelfth out of sixteen participants on sub-task (a) and eleventh out of fifteen participants on sub-task (b).

Scientific Publications of the University of Toulouse II Le Mirail

Extraction automatique de termes-clés : Comparaison de méthodes non supervisées [Automatic key term extraction: a comparison of unsupervised methods]

Author: Mothe Josiane
Ramiandrisoa Faneva
Publication venue: HAL CCSD
Publication date: 01/01/2016

This article presents a state of the art and a comparison of unsupervised, automatic methods for extracting keywords from the textual content of documents. We evaluate several methods from the literature on two document corpora by comparing the extracted key terms with those originally associated with the documents. We found that the method based on a TF-IDF measure returned the results closest to the keywords assigned by the documents' authors.
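
The abstract above reports that a TF-IDF-based method came closest to the authors' keywords. As a hedged sketch of the general idea only, the code below scores each term of a document by its relative frequency times the log inverse document frequency and keeps the top-scoring terms. The whitespace tokenizer, the exact TF and IDF variants, the function name `tfidf_keywords`, and the toy corpus are all assumptions; the paper's actual configuration is not given in the abstract.

```python
import math
from collections import Counter
from typing import List, Sequence


def tfidf_keywords(docs: Sequence[str], doc_index: int, top_k: int = 3) -> List[str]:
    """Return the top_k terms of docs[doc_index] ranked by TF-IDF."""
    tokenized = [d.lower().split() for d in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    target = tokenized[doc_index]
    tf = Counter(target)
    scores = {
        term: (count / len(target)) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


# Toy corpus, made up for illustration.
docs = [
    "the graph method ranks the graph nodes",
    "the tree method splits the data",
    "the data is the input",
]
print(tfidf_keywords(docs, 0))
```

Terms that occur in every document, such as "the" here, get an IDF of zero and drop to the bottom of the ranking, which is why no explicit stop-word list is needed in this small example.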

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

Profil utilisateur dans les réseaux sociaux : État de l'art [User profiles in social networks: state of the art]

Author: Mothe Josiane
Ramiandrisoa Faneva
Publication venue: HAL CCSD
Publication date: 01/01/2017

Social networks are increasingly used; users exchange information on them and provide details about their profile. These data can be used to model an individual according to the activities they carry out on the social network: this is the user profile. These profiles can then be analyzed and exploited depending on the application domain. For example, the advertisements shown on social networks differ for each user according to that user's preferences. This article is a state of the art on the information that makes up a user profile and on the representation of profiles. We also indicate research directions for our work in this area.

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

Early Detection of Depression and Anorexia from Social Media: A Machine Learning Approach

Author: Mothe Josiane
Ramiandrisoa Faneva
Publication venue: HAL CCSD
Publication date: 06/07/2020

In this paper, we present a social media mining approach to help the early detection of two mental illnesses: depression and anorexia. We aim to detect users who are likely to be ill by learning from annotated examples. We mine texts to extract features for text representation and also use a word embedding representation. The machine learning model we propose uses these two types of text representation to predict the likelihood that each user is ill. We use 58 features from the state of the art and 198 features that are new in this domain and are part of our contribution. We evaluate our model on the CLEF eRisk 2018 reference collections. For depression detection, our model based on word embeddings achieves the best performance according to the ERDE 50 measure, and the model based on features only achieves the best performance according to precision. For anorexia detection, the model based on word embeddings achieves the second-best results on ERDE 50 and recall. We also observed that many of the new features we added contribute to improving the results.

Scientific Publications of the University of Toulouse II Le Mirail

IRIT at TRAC 2018

Author: Mothe Josiane
Ramiandrisoa Faneva
Publication venue: HAL CCSD
Publication date: 01/01/2018

This paper describes the participation of the IRIT team in the TRAC 2018 shared task on Aggression Identification and, more precisely, in the shared task in English. The following three methods were used: (a) a combination of machine learning techniques that relies on a set of features and document/text vectorization, (b) a Convolutional Neural Network (CNN), and (c) a combination of a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). The best results were obtained with method (a) on the English test data from Facebook, which ranked our method sixteenth out of thirty teams, and with method (c) on the English test data from other social media, where we ranked fifteenth out of thirty.

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte